1 Introduction

Socioeconomics (also known as social economics) is the social science that studies how economic activity affects and is shaped by social processes. In general, it analyzes how societies progress, stagnate, or regress because of their local or regional economy, or the global economy. Indicators such as GDP per capita, population growth and life expectancy are commonly used to measure the progress of societies.

1.1 Steps taken

  1. Data cleansing (if required)
  2. Data transformation
  3. Visualisations

Here we analyse socioeconomic conditions across five continents over the period 1952 to 2007. The dataset under investigation is the Gapminder dataset; for further details, please refer to Gapminder. The features used in the analysis and their definitions are listed below.

Feature    Definition
country    Name of the country
continent  Name of the continent
year       Year in which the observation was recorded
lifeExp    Life expectancy: a statistical measure of the average time an organism is expected to live, based on the year of its birth, its current age and other demographic factors including sex
pop        Population of the country in the given year
gdpPercap  GDP per capita (PPP): GDP on a purchasing power parity basis divided by the population as of 1 July of the same year

Let's have a quick look at the data. The first five rows show all six features and their associated values.

Gapminder data table
country      continent  year  lifeExp  pop       gdpPercap
Afghanistan  Asia       1952  28.801   8425333   779.4453
Afghanistan  Asia       1957  30.332   9240934   820.8530
Afghanistan  Asia       1962  31.997   10267083  853.1007
Afghanistan  Asia       1967  34.020   11537966  836.1971
Afghanistan  Asia       1972  36.088   13079460  739.9811

To understand the data at a higher level, a summary table is a crucial step. The key column lists the features present in the data. The remaining columns give the mean, the median and the number of missing values for each feature (the mean and median are blank for the non-numeric features continent and country). The data is clean, with zero missing values.

key        mean         median      missing
continent                           0
country                             0
gdpPercap  7215.33      3531.85     0
lifeExp    59.47        60.71       0
pop        29601212.32  7023595.50  0
year       1979.50      1979.50     0
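A summary of this kind can be sketched as follows. The report's output appears to come from R, whose code is not shown, so this is only an illustrative Python version using the standard library. It runs on the five Afghanistan rows shown earlier rather than the full dataset, and the helper name `summarise` is an assumption introduced here.

```python
from statistics import mean, median

# Sample rows copied from the data table above (Afghanistan, 1952-1972).
# The full dataset has 1704 rows; this subset only illustrates the mechanics.
rows = [
    {"year": 1952, "lifeExp": 28.801, "pop": 8425333, "gdpPercap": 779.4453},
    {"year": 1957, "lifeExp": 30.332, "pop": 9240934, "gdpPercap": 820.8530},
    {"year": 1962, "lifeExp": 31.997, "pop": 10267083, "gdpPercap": 853.1007},
    {"year": 1967, "lifeExp": 34.020, "pop": 11537966, "gdpPercap": 836.1971},
    {"year": 1972, "lifeExp": 36.088, "pop": 13079460, "gdpPercap": 739.9811},
]


def summarise(rows, column):
    """Return mean, median and missing-value count for one numeric column.

    Hypothetical helper: a value of None (or an absent key) counts as missing.
    """
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "mean": mean(values),
        "median": median(values),
        "missing": len(rows) - len(values),
    }


summary = {col: summarise(rows, col) for col in ("gdpPercap", "lifeExp", "pop", "year")}
for col, stats in summary.items():
    print(col, stats)
```

Because only five rows are used, the printed means and medians differ from the full-data figures in the table above.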

The tables below summarise, by continent, the range of each of the three numeric features: lifeExp, pop and gdpPercap.

lifeExp
          Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
Africa    23.6   42.37    47.79   48.87  54.41    76.44
Americas  37.58  58.41    67.05   64.66  71.7     80.65
Asia      28.8   51.43    61.79   60.06  69.51    82.6
Europe    43.58  69.57    72.24   71.9   75.45    81.76
Oceania   69.12  71.2     73.66   74.33  77.55    81.24

pop
          Min.     1st Qu.  Median    Mean      3rd Qu.   Max.
Africa    60010    1342000  4579000   9916000   10800000  1.35e+08
Americas  662800   2962000  6228000   24500000  18340000  301100000
Asia      120400   3844000  14530000  77040000  46300000  1.319e+09
Europe    148000   4332000  8551000   17170000  21800000  82400000
Oceania   1995000  3199000  6403000   8875000   14350000  20430000

gdpPercap
          Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
Africa    241.2  761.2    1192    2194   2377     21950
Americas  1202   3428     5466    7136   7830     42950
Asia      331    1057     2647    7902   8549     113500
Europe    973.5  7213     12080   14470  20460    49360
Oceania   10040  14140    17980   18620  22210    34440
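The per-continent summaries above could be reproduced along the following lines. This is a minimal Python sketch, not the report's own code: it uses only the ten 1952 rows printed later in this report, and the helper names are assumptions. The quartiles use linear interpolation between order statistics, which is the default quantile rule in R.

```python
import math
from collections import defaultdict
from statistics import mean

# (continent, lifeExp) pairs for the ten 1952 rows printed later in this report.
sample = [
    ("Asia", 28.801), ("Europe", 55.230), ("Africa", 43.077), ("Africa", 30.015),
    ("Americas", 62.485), ("Oceania", 69.120), ("Europe", 66.800), ("Asia", 50.939),
    ("Asia", 37.484), ("Europe", 68.000),
]


def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (R's type 7)."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def six_number_summary(values):
    """Min, quartiles, mean and max, in the column order used in the tables."""
    v = sorted(values)
    return {
        "min": v[0],
        "q1": quantile(v, 0.25),
        "median": quantile(v, 0.5),
        "mean": mean(v),
        "q3": quantile(v, 0.75),
        "max": v[-1],
    }


# Group the values by continent, then summarise each group.
groups = defaultdict(list)
for continent, life_exp in sample:
    groups[continent].append(life_exp)

by_continent = {c: six_number_summary(v) for c, v in groups.items()}
for continent, stats in sorted(by_continent.items()):
    print(continent, stats)
```

With so few rows per continent the numbers will not match the tables above; the sketch only shows how each column is derived.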

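Let us fit a linear model to get an overview of the relationship between gdpPercap and lifeExp. In the output below, the slope of 445.4 means that each additional year of life expectancy is associated with roughly 445 more in GDP per capita, and the \(R^2\) of 0.3407 means that life expectancy alone explains about a third of the variance in GDP per capita.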

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -19277    914.1       -21.09   6.745e-88
lifeExp      445.4     15.02       29.66    3.566e-156

Fitting linear model: gdpPercap ~ lifeExp
Observations  Residual Std. Error  \(R^2\)  Adjusted \(R^2\)
1704          8006                 0.3407   0.3403

Analysis of Variance Table
           Df    Sum Sq     Mean Sq    F value  Pr(>F)
lifeExp    1     5.638e+10  5.638e+10  879.6    3.566e-156
Residuals  1702  1.091e+11  64100173   NA       NA
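As a consistency check, the quantities reported in the model output are related to one another as follows. The numbers are the rounded values copied from the tables; the symbols \(SS\), \(MS\) and \(\hat{\sigma}\) are notation introduced here for the sums of squares, mean squares and residual standard error.

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{lifeExp}}}{SS_{\text{lifeExp}} + SS_{\text{Residuals}}}
     = \frac{5.638 \times 10^{10}}{5.638 \times 10^{10} + 1.091 \times 10^{11}}
     \approx 0.3407 \\[4pt]
F &= \frac{MS_{\text{lifeExp}}}{MS_{\text{Residuals}}}
   = \frac{5.638 \times 10^{10}}{64100173}
   \approx 879.6 \approx t_{\text{lifeExp}}^2 = 29.66^2 \\[4pt]
\hat{\sigma} &= \sqrt{MS_{\text{Residuals}}} = \sqrt{64100173} \approx 8006
\end{align*}
```

With a single predictor the F statistic equals the square of the slope's t value, which is why both rows report the same p-value of 3.566e-156.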

1.2 Africa

1.3 Asia

1.4 Europe

1.5 Americas

1.6 Oceania

1.7 Specific Countries

           lifeExp    pop         gdpPercap
lifeExp    1.0000000  0.0649554   0.5837062
pop        0.0649554  1.0000000   -0.0255996
gdpPercap  0.5837062  -0.0255996  1.0000000
## 
##  Pearson's product-moment correlation
## 
## data:  gapminder$pop and gapminder$gdpPercap
## t = -1.0565, df = 1702, p-value = 0.2909
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.07299723  0.02191346
## sample estimates:
##         cor 
## -0.02559958
##  Factor w/ 12 levels "1952","1957",..: 1 2 3 4 5 6 7 8 9 10 ...
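The figures in the Pearson test output above can be checked by hand from the reported correlation and the sample size. The sketch below is in Python and is an assumption about how the numbers arise, not the report's own code: it computes the t statistic for a zero-correlation null hypothesis and an approximate confidence interval through the Fisher z-transformation. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_t(r, n):
    """t statistic for H0: true correlation is 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Correlation between pop and gdpPercap reported above; 1704 rows gives df = 1702.
r, n = -0.02559958, 1704
t_stat = pearson_t(r, n)
ci_low, ci_high = fisher_ci(r, n)
print(round(t_stat, 4), round(ci_low, 4), round(ci_high, 4))
```

The interval contains zero and the p-value is 0.2909, so the data gives no evidence of a linear relationship between population and GDP per capita.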

Here we can see that the data covers 142 countries and 5 continents, with observations recorded 12 times between 1952 and 2007, once every five years (142 × 12 = 1704 rows).

Now let’s compare the situation in 1952 with the more recent situation in 2007. We first filter the data for the year 1952 and take a quick look at it before visualising.

## # A tibble: 142 x 6
##        country continent  year lifeExp      pop  gdpPercap
##         <fctr>    <fctr> <int>   <dbl>    <int>      <dbl>
##  1 Afghanistan      Asia  1952  28.801  8425333   779.4453
##  2     Albania    Europe  1952  55.230  1282697  1601.0561
##  3     Algeria    Africa  1952  43.077  9279525  2449.0082
##  4      Angola    Africa  1952  30.015  4232095  3520.6103
##  5   Argentina  Americas  1952  62.485 17876956  5911.3151
##  6   Australia   Oceania  1952  69.120  8691212 10039.5956
##  7     Austria    Europe  1952  66.800  6927772  6137.0765
##  8     Bahrain      Asia  1952  50.939   120447  9867.0848
##  9  Bangladesh      Asia  1952  37.484 46886859   684.2442
## 10     Belgium    Europe  1952  68.000  8730405  8343.1051
## # ... with 132 more rows
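The filtering step can be sketched as below. This is an illustrative Python version under stated assumptions (the report's own output looks like R): the handful of records are copied from tables in this report, and `filter_year` is a hypothetical helper name.

```python
# A few records copied from the tables in this report; the full data has 1704 rows.
FIELDS = ("country", "continent", "year", "lifeExp", "pop", "gdpPercap")
records = [
    ("Afghanistan", "Asia", 1952, 28.801, 8425333, 779.4453),
    ("Afghanistan", "Asia", 1957, 30.332, 9240934, 820.8530),
    ("Afghanistan", "Asia", 2007, 43.828, 31889923, 974.5803),
    ("Albania", "Europe", 1952, 55.230, 1282697, 1601.0561),
    ("Albania", "Europe", 2007, 76.423, 3600523, 5937.0295),
]
rows = [dict(zip(FIELDS, record)) for record in records]


def filter_year(rows, year):
    """Keep only the observations recorded in the given year."""
    return [row for row in rows if row["year"] == year]


rows_1952 = filter_year(rows, 1952)
for row in rows_1952:
    print(row["country"], row["lifeExp"], row["gdpPercap"])
```

Applied to the full dataset, the same filter would keep one row per country, which is why the table above has 142 rows.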

Now let’s look at the visualisations. We plot GDP per capita against life expectancy to evaluate the performance of various countries and continents. Rich and healthy countries appear in the top right corner of the plot, while poor and unhealthy countries appear in the bottom left.

Here we saw that most countries were poor and unhealthy in 1952, especially those from Africa and Asia. North American countries led in both GDP per capita and life expectancy. In the next plot, countries from the same continent are connected to each other.

Here we can see that Kuwait's GDP per capita varied hugely because of the hard conditions the country faced in the 1980s. It suffered a stock market crash in that decade, followed by a huge drop in the price of oil, which accounts for a major part of its economy. Before it could recover from these two events, it also faced the Gulf War in the Middle East.

1.8 Fast forward to 2007

Let's filter the data for the year 2007 and visualise it for analysis.

## # A tibble: 142 x 6
##        country continent  year lifeExp       pop  gdpPercap
##         <fctr>    <fctr> <int>   <dbl>     <int>      <dbl>
##  1 Afghanistan      Asia  2007  43.828  31889923   974.5803
##  2     Albania    Europe  2007  76.423   3600523  5937.0295
##  3     Algeria    Africa  2007  72.301  33333216  6223.3675
##  4      Angola    Africa  2007  42.731  12420476  4797.2313
##  5   Argentina  Americas  2007  75.320  40301927 12779.3796
##  6   Australia   Oceania  2007  81.235  20434176 34435.3674
##  7     Austria    Europe  2007  79.829   8199783 36126.4927
##  8     Bahrain      Asia  2007  75.635    708573 29796.0483
##  9  Bangladesh      Asia  2007  64.062 150448339  1391.2538
## 10     Belgium    Europe  2007  79.441  10392226 33692.6051
## # ... with 132 more rows

1.9 Insights

  1. After the analysis we can clearly see that most countries from Europe have made progress in both gdpPercap and lifeExp, whereas most countries from Africa still have lower gdpPercap and lifeExp.

  2. Populous countries including India and China have shown a huge increase in lifeExp but only a minor improvement in gdpPercap.
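One way to put numbers on insights like these is to compare each country's 1952 and 2007 values directly. The Python sketch below does this for four countries whose rows appear in both tables above. The choice of measures (absolute gain in lifeExp, ratio of gdpPercap) and the helper name `change` are assumptions made for illustration; India and China are not included because their rows are not shown in this report.

```python
# ((lifeExp, gdpPercap) in 1952, (lifeExp, gdpPercap) in 2007), copied from the
# 1952 and 2007 tables above.
data = {
    "Afghanistan": ((28.801, 779.4453), (43.828, 974.5803)),
    "Algeria": ((43.077, 2449.0082), (72.301, 6223.3675)),
    "Austria": ((66.800, 6137.0765), (79.829, 36126.4927)),
    "Bangladesh": ((37.484, 684.2442), (64.062, 1391.2538)),
}


def change(before, after):
    """Return (years of lifeExp gained, gdpPercap growth factor)."""
    life_gain = after[0] - before[0]
    gdp_ratio = after[1] / before[1]
    return life_gain, gdp_ratio


changes = {country: change(b, a) for country, (b, a) in data.items()}
for country, (life_gain, gdp_ratio) in changes.items():
    print(f"{country}: +{life_gain:.1f} years, gdpPercap x{gdp_ratio:.2f}")
```

In this small sample, Bangladesh gains the most years of life expectancy while its GDP per capita only about doubles, whereas Austria's GDP per capita grows nearly sixfold.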

## # A tibble: 12 x 6
## # Groups:   year [12]
##    country continent  year lifeExp     pop gdpPercap
##     <fctr>    <fctr> <int>   <dbl>   <int>     <dbl>
##  1  Kuwait      Asia  1952  55.565  160000 108382.35
##  2  Kuwait      Asia  1957  58.033  212846 113523.13
##  3  Kuwait      Asia  1962  60.470  358266  95458.11
##  4  Kuwait      Asia  1967  64.624  575003  80894.88
##  5  Kuwait      Asia  1972  67.712  841934 109347.87
##  6  Kuwait      Asia  1977  69.343 1140357  59265.48
##  7  Kuwait      Asia  1982  71.309 1497494  31354.04
##  8  Kuwait      Asia  1987  74.174 1891487  28118.43
##  9  Kuwait      Asia  1992  75.190 1418095  34932.92
## 10  Kuwait      Asia  1997  76.156 1765345  40300.62
## 11  Kuwait      Asia  2002  76.904 2111561  35110.11
## 12  Kuwait      Asia  2007  77.588 2505559  47306.99
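The size of Kuwait's swings can be read off the rows above with a few lines of code. The Python sketch below is illustrative only; the variable names and the choice to measure the fall from 1972 to 1987 are assumptions.

```python
# Kuwait's gdpPercap by year, copied from the table above.
kuwait_gdp = {
    1952: 108382.35, 1957: 113523.13, 1962: 95458.11, 1967: 80894.88,
    1972: 109347.87, 1977: 59265.48, 1982: 31354.04, 1987: 28118.43,
    1992: 34932.92, 1997: 40300.62, 2002: 35110.11, 2007: 47306.99,
}

# Years with the highest and lowest recorded GDP per capita.
peak_year = max(kuwait_gdp, key=kuwait_gdp.get)
trough_year = min(kuwait_gdp, key=kuwait_gdp.get)

# Relative fall between the 1972 value and the 1987 low.
drop_1972_1987 = 1 - kuwait_gdp[1987] / kuwait_gdp[1972]

print(peak_year, trough_year, f"{drop_1972_1987:.1%}")
```

On these figures GDP per capita fell by roughly three quarters between 1972 and 1987, and by 2007 it was still below half of its 1957 peak.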

## # A tibble: 300 x 6
##          country continent  year lifeExp       pop gdpPercap
##           <fctr>    <fctr> <int>   <dbl>     <int>     <dbl>
##  1        Canada  Americas  2007  80.653  33390141 36319.235
##  2        Canada  Americas  2002  79.770  31902268 33328.965
##  3    Costa Rica  Americas  2007  78.782   4133884  9645.061
##  4   Puerto Rico  Americas  2007  78.746   3942491 19328.709
##  5        Canada  Americas  1997  78.610  30305843 28954.926
##  6         Chile  Americas  2007  78.553  16284741 13171.639
##  7          Cuba  Americas  2007  78.273  11416987  8948.103
##  8 United States  Americas  2007  78.242 301139947 42951.653
##  9    Costa Rica  Americas  2002  78.123   3834934  7723.447
## 10        Canada  Americas  1992  77.950  28523502 26342.884
## # ... with 290 more rows

1.10 Clusters

##    y
## x   Africa Americas Asia Europe Oceania
##   1    624      300  374    360      24
##   2      0        0   22      0       0

##    y
## x   Africa Americas Asia Europe Oceania
##   1     52       25   31     30       2
##   2      0        0    2      0       0

##    y
## x   Africa Americas Asia Europe Oceania
##   1     52       25   31     30       2
##   2      0        0    2      0       0
Africa  Americas  Asia  Europe  Oceania
52      25        31    30      2
0       0         2     0       0